feat(network): IPv6/dual-stack monitors + gateway wiring (Node Doctor TaskForge feature 1125)#22
Conversation
Detection-only monitor (type network-ipv6-sysctl) reading
/proc/sys/net/ipv6/conf/{all,default,<iface>}/disable_ipv6 and emitting
IPv6SysctlMisconfigured when IPv6 is disabled but expected enabled.
Self-registers via init(); cmd wiring owned by #17209.
Adds table-driven unit tests (94.5% coverage) with t.TempDir() /proc
fixtures plus static testdata fixtures.
…(Task #17239) Wire the landed detectDefaultIPv6Gateway parser into the gateway monitor. New config key addressFamily (ipv4|ipv6|auto, default ipv4 preserves prior behavior); getGatewayIP now returns the probed family and records it on types.GatewayLatency.AddressFamily. auto prefers IPv4, falls back to IPv6. Prometheus address_family label deferred to #17216. Note: task's stated filesAffected (pkg/metrics/latency.go, pkg/events/ network.go) do not exist; GatewayLatency lives in pkg/types/types.go.
…#17206) New detection-only monitor (type network-ipv6-route) that reuses the landed detectDefaultIPv6GatewayFromFile parser to verify an IPv6 default route is present when expected, emitting IPv6DefaultRouteMissing. Distinguishes unreadable route table (warning, IPv6 may be absent) from genuinely-absent default route. Self-registers via init(); cmd wiring owned by #17209. Unit tests at 95.9% coverage.
…sk #17207) New /proc-based detection-only monitor (type network-ipv6-neighbor) parsing /proc/net/if_inet6 (link-local/global address presence) and accept_ra/autoconf sysctls. Emits IPv6LinkLocalMissing, IPv6GlobalAddressMissing, IPv6RouterAdvertisementDisabled. No new deps; netlink NDP neighbor-cache documented as out-of-scope follow-up. Self-registers via init() (#17209 owns cmd wiring). 93.2% coverage.
DETECTION-ONLY monitor (type network-ipv6-firewall) that lists ip6tables (-S) or nft (list ruleset) read-only and flags IPv6FirewallBlackhole when all built-in filter chains default-DROP with no ACCEPT anywhere. backend config auto|ip6tables|nft. Missing binary / read error -> warning, never an error; never issues a mutating firewall verb. Injectable command executor for tests; 94.9% coverage. Self-registers (#17209 owns cmd).
The four IPv6 monitors (network-ipv6-{sysctl,route,neighbor,firewall})
self-register via the blank import of pkg/monitors/network and are
auto-enabled by ApplyDefaultMonitors since each provides a DefaultConfig
— no per-monitor wiring needed in cmd. This test pins that contract
(registration + default-application) so a future regression fails loudly.
Shipped-config additions are owned by #17228.
| } | ||
|
|
||
| func (e *defaultCommandExecutor) Run(ctx context.Context, name string, args ...string) ([]byte, error) { | ||
| cmd := exec.CommandContext(ctx, name, args...) |
| // Scope 0x20 = link-local, 0x00 = global. Parse errors on individual lines are | ||
| // skipped; a read/open failure is returned to the caller. | ||
| func parseIfInet6File(path string) ([]ipv6Address, error) { | ||
| f, err := os.Open(path) |
| // readSysctlInt reads a sysctl-style file and returns its integer value (after | ||
| // trimming whitespace). Used for accept_ra / autoconf, which take values 0/1/2. | ||
| func readSysctlInt(path string) (int, error) { | ||
| data, err := os.ReadFile(path) |
| // (trimmed of whitespace) is "1". Any other value is treated as false. Errors | ||
| // are propagated. | ||
| func readSysctlBool(path string) (bool, error) { | ||
| data, err := os.ReadFile(path) |
…(Task #17203) Bind Prometheus exporter and health server to :: by default (accepts IPv4+IPv6 when bindv6only=0), with graceful fallback to 0.0.0.0 when the IPv6/dual-stack bind fails (nodes may have IPv6 disabled). Exporter bind address is now configurable via prometheus.bindAddress (default ::); replaced hardcoded 0.0.0.0 sprintf with net.JoinHostPort. Extracted listenWithFallback/isDualStackHost helpers (unit-tested). Server inherits caller ctx via BaseContext. cmd/main.go health server default -> ::.
…204) When a kubelet healthz/metrics probe to a loopback host fails with a connection-level (dial) error, retry once against the opposite-family loopback ([::1] <-> 127.0.0.1) preserving scheme/port/path/query/auth. Only loopback hosts are rewritten; HTTP 4xx/5xx and deadline timeouts do not trigger fallback. Shared seam doRequestWithLoopbackFallback covers both healthz and metrics. Helpers unit-tested; ::1 bind test skips when IPv6 loopback unavailable.
…17233)
resolveTarget now parses fe80::1%eth0 (split on last %, retains zone) and
returns the zone; singlePing sends via net.IPAddr{IP,Zone} so link-local
pings carry the required scope. Also fix reply matching to compare IP
bytes (IP.Equal) instead of String(), so zoned replies (fe80::1%eth0)
match the zoneless target. Family stays ipv6 for link-local. Tests cover
zone parsing, destAddr threading, and zone-tolerant peer matching.
…234) Pre-resolve hostname targets via the injectable Resolver (LookupIP), select an address (IPv4-preferred, IPv6 if no v4) to set PingResult.Family truthfully, and pin the dial to that IP via a per-target Transport DialContext (original Host header preserved for TLS/vhost). Resolution failure falls back to the original unpinned probe with empty family. IP literals unchanged (resolver not consulted). Tests use a fake resolver.
…#17235) Install ipv6.ICMPFilter accepting only EchoReply on the v6 listener (non-fatal if unsupported; receive loop still filters). Replace the process-global ICMP id with a per-pinger-instance id (pid mixed with an atomic counter, masked to 16 bits, non-zero) so concurrent pingers don't cross-match replies. Live-socket integration test stays gated by testing.Short(). Adds id-uniqueness unit test.
| // is false. The label argument ("healthz" or "metrics") is used only for | ||
| // logging. | ||
| func (c *defaultKubeletClient) doRequestWithLoopbackFallback(req *http.Request, label string) (resp *http.Response, usedFallback bool, err error) { | ||
| resp, err = c.client.Do(req) |
|
|
||
| // Rebuild the request against the alternate loopback, preserving method, | ||
| // context, and headers (including any auth header already applied). | ||
| fbReq, buildErr := http.NewRequestWithContext(req.Context(), req.Method, fallbackURL, nil) |
| log.Printf("[INFO] kubelet %s: loopback probe to %s failed (%v), retrying %s", | ||
| label, req.URL.Host, primaryErr, fbReq.URL.Host) | ||
|
|
||
| resp, err = c.client.Do(fbReq) |
| // the value is deterministic and unique per instance (no randomness required). | ||
| func nextPingerID() uint16 { | ||
| n := atomic.AddUint32(&pingerInstanceCounter, 1) | ||
| id := uint16(os.Getpid()) + uint16(n) |
| // the value is deterministic and unique per instance (no randomness required). | ||
| func nextPingerID() uint16 { | ||
| n := atomic.AddUint32(&pingerInstanceCounter, 1) | ||
| id := uint16(os.Getpid()) + uint16(n) |
| id := uint16(os.Getpid()) + uint16(n) | ||
| if id == 0 { | ||
| // Avoid an all-zero ID, which is a poor discriminator on the wire. | ||
| id = uint16(n) | 0x8000 |
| } | ||
| fbReq.Header = req.Header.Clone() | ||
|
|
||
| log.Printf("[INFO] kubelet %s: loopback probe to %s failed (%v), retrying %s", |
…ing (Task #17236) Add NODE_DOCTOR_ICMP_INTEGRATION env gate: when set (CI), socket/ permission failures in TestDefaultPinger_Integration become hard failures instead of silent t.Skip; -short still skips, unset+non-short still skips gracefully on unprivileged dev boxes. New Makefile target test-net-icmp-integration and a separate (non-blocking) CI job compile the test binary as the runner user and run it under sudo for CAP_NET_RAW.
Add TestResolveTarget_HostnameDNSPath (localhost via /etc/hosts, no network) exercising the net.ParseIP-fails -> net.LookupIP branch with loopback/family/zone assertions, plus TestResolveTarget_HostnameResolution Failure using an RFC 6761 .invalid name for the error branch. Test-only.
Both default-route parsers now scan all default routes and select the lowest-metric one (kernel behavior) instead of first-seen. IPv4 metric = /proc/net/route col 6 decimal; IPv6 metric = /proc/net/ipv6_route col 5 hex. Tie -> first-seen; malformed metric -> treated as max (line still eligible if sole default). New dedicated multi-default fixtures (shared ipv6_route fixture untouched to protect #17239 tests).
…241) When a custom DNS query is RecordType AAAA with TestEachNameserver and/or ConsistencyCheck enabled (both A-only), emit a single Warning event (AAAAFeatureUnsupported) per query naming the skipped feature(s) and domain, instead of silently dropping them. AAAA lookup behavior and the A-query path are unchanged. Tests cover each flag, both, neither, and A.
Add TestCheckCustomQueries_AAAAScopedAddresses asserting the AAAA custom- query path treats scoped/non-global IPv6 results (fe80::/10 link-local, fc00::/7 ULA, ::1 loopback, and a scoped+global mix) as a successful lookup emitting no error/no-records events. Test-only.
…(Task #17243) Add TestCheckCustomQueries_AAAARecordTypeInEventMessage asserting the failure, no-records, and high-latency events for AAAA custom queries include the "AAAA" record-type string in their message. Test-only.
The #17202 carry-forward made PeerStatus.Family monotonic-sticky: a fresh probe resolving a different family (peer re-IP v4<->v6, overlay-test toggle) was ignored. Now the most recent check's resolved family wins; the prior family is retained only when a check resolves no family at all (transient failure / pinger error early-return), preserving #17202 behavior. Tests cover both re-IP directions + carry-forward boundary.
…s (Task #17232) Root cause: the test started the controller server on hardcoded 0.0.0.0: 8080 only to run the correlator's background loop; the bind collided with a port already in use, and two subtests ignored the Start() error so the correlator loop never ran -> no correlations -> misleading assertion failure. Fix (test-only): use an ephemeral port (Port=0), assert Start() succeeds in every subtest, and replace fixed sleeps with the Eventually poll helper. Assertions unchanged.
Add address_family label (ipv4|ipv6|unknown) to GatewayLatencySeconds,
Peer{Latency,LatencyAvg,Reachable}, and DNSLatencySeconds. Add
AddressFamily to types.{PeerLatency,DNSLatency,APIServerLatency}
(GatewayLatency already had it) and populate from PeerStatus.Family,
DNS RecordType (AAAA->ipv6), and the classified API-server endpoint.
familyLabel() normalizes empty/unexpected -> unknown so the label is
never empty. No import cycle (monitors use existing family consts;
types untouched re imports). Tests for label emission + classifier.
…ls (Task #17217) Add an address_family query template variable (multi/includeAll) to the CNI network-health dashboard and apply address_family=~"$address_family" to the peer latency/reachability panels, grouping the per-node timeseries by (node, address_family) so IPv4 vs IPv6 series are distinguishable. The overview/system/kubernetes dashboards have no relabeled-metric panels and are unchanged.
…k #17218)
Add dashboards/node-doctor-ipv6.json (29 panels): IPv6 monitor condition
health (condition_status{condition_type=~"IPv6.*"} for the six new
conditions), condition activity over time, dual-stack connectivity
(peer/gateway latency + reachability filtered by address_family, A-vs-AAAA
DNS), and IPv6 monitor-up/events. Mirrors the CNI dashboard schema/
templating; all PromQL uses verified metric+label names.
… (Task #17219)
Rewrite NodeDoctorHighPeerLatency and NodeDoctorLowPeerConnectivity to
aggregate by (node, address_family) using the family-labeled peer gauges,
so IPv4/IPv6 alert independently and the firing alert carries
address_family. Add NodeDoctorIPv6Misconfigured firing on
condition_status{condition_type=~"IPv6.*"}==1 (for 10m, address_family=
ipv6 label), gated by prometheusRule.warning.ipv6Misconfigured.enabled
(default true). Mirror the two rewrites + new alert into the static
deployment/prometheusrule.yaml. helm template renders; YAML valid.
…Task #17210) NewRegistry (the registry the exporter serves) already registers the Go runtime and process collectors. Add TestNewRegistry_IncludesGoAndProcess Collectors asserting go_goroutines and (on linux) process_* / process_start_time_seconds are exposed, so the runtime/process self- metrics can't silently regress. Test-only; functionality already present.
Add monitor_cycles_total{node,monitor_name,result} and
monitor_cycle_last_timestamp_seconds{node,monitor_name} (last-run
heartbeat), recorded once per cycle in ExportStatus via new
RecordMonitorCycle. Also wire the previously-defined-but-unobserved
monitor_check_duration_seconds histogram. result=error when the status
carries any ConditionFalse. Tests for the recorder + ExportStatus path.
Add exporter_healthy, exporter_last_success_timestamp_seconds, and
exporter_consecutive_failures gauges ({node,exporter}), updated via a
recordExportHealth helper at the success/error branches of both
ExportStatus and ExportProblem. Consecutive failures tracked in a
mutex-guarded field mirrored to the gauge. Validation-failure branches
now also record ExportErrorsTotal (previously had no production site).
Tests cover success/failure/reset. Race-clean.
Expose the remediator circuit-breaker state as gauge
remediator_circuit_breaker_state{node} (0=closed,1=open,2=half-open) via
a push-observer: remediators defines CircuitStateObserver (int-based, no
prometheus import), the registry notifies it on every transition and on
SetCircuitStateObserver (current-state push); the prometheus exporter
implements ObserveCircuitState to set the gauge; main wires them when
both exist. No import cycle. Tests on both sides.
Add config_reloads_total{node,result}, config_reload_last_timestamp_
seconds, config_reload_last_success, config_reload_duration_seconds.
ReloadCoordinator gains an injected ReloadMetricsRecorder func (no
prometheus import); performReload uses a named return + deferred closure
so every return path (load/validation/callback error, no-op, success)
records exactly once. Exporter.RecordConfigReload implements it; wired in
main via det.SetReloadMetricsRecorder(promExporter.RecordConfigReload).
Tests on both sides; no import cycle.
…215) Add TestSelfMetricsRegistered: registers Metrics into NewRegistry (incl. go/process collectors), populates every self-metric via its recorder (RecordMonitorCycle/recordExportHealth/ObserveCircuitState/ RecordConfigReload + direct export-ops Inc/Observe), gathers, and asserts all 14 node_doctor_* self-metric families plus go_goroutines and (linux) process_start_time_seconds are present. Test-only.
Add NetworkFlushIPv6Route ("flush-ipv6-route") operation to
NetworkRemediator: runs ip -6 route flush cache, optional ip -6 route show
backup when BackupRouting. Unlike reset-routing, a flush failure is a hard
error (dedicated flush remediator). verifyOperation is a no-op (immediate).
Registration/enablement left to #17222. Tests assert exact command args,
failure semantics, backup ordering, and verify path.
Document and test that flushDNS is address-family agnostic: resolvectl flush-caches / systemd-resolve --flush-caches clear the full resolver cache (A and AAAA), so no separate IPv6 DNS-flush op is needed. New TestNetworkRemediator_FlushDNS_CoversIPv6 asserts the exact full-cache flush command for both backends and that no family/type-restricting flag (-4/-6/--type) is passed.
Add pinger_ipv6_integration_test.go (//go:build integration) exercising the real defaultPinger against ::1 and a discovered link-local target, asserting FamilyIPv6 classification; skips cleanly when IPv6 loopback or CAP_NET_RAW is unavailable. Complements the untagged platform-agnostic v6 unit tests (zone/family/destAddr/peer-match) in pinger_test.go. The -short unit tier never opens raw sockets.
…226) Add test/integration/testdata/kind-dualstack.yaml (networking.ipFamily: dual, dual pod/service subnets) and test/integration/ipv6_kind_test.go (//go:build integration): creates a uniquely-named dual-stack kind cluster, waits for node readiness, and asserts dual-stack — node PodCIDRs + InternalIPs carry both families and a RequireDualStack Service gets both an IPv4 and IPv6 ClusterIP (fails on single-stack). Skips cleanly when short/kind-missing/docker-down; t.Cleanup always deletes the cluster; isolated temp kubeconfig.
Add .github/workflows/ci-ipv6.yml with two jobs: ipv6-icmp (compile network tests with -tags=integration, run the v6 ICMP tests under sudo + NODE_DOCTOR_ICMP_INTEGRATION for CAP_NET_RAW) and ipv6-kind-dualstack (enable Docker daemon IPv6 with ip6tables+experimental — required for kind dual-stack — install pinned kind, run TestIPv6KindDualStackCluster with 20m timeout + always-cleanup). Path-filtered to IPv6-relevant dirs, workflow_dispatch, concurrency cancel. Action/Go versions match ci.yml.
…17228) Add explicit ipv6-sysctl-check, ipv6-route-check, ipv6-neighbor-check, and ipv6-firewall-check entries (matching each monitor's DefaultConfig) to config/node-doctor.yaml, with a note that they are detection-only and degrade gracefully on IPv4-only nodes (set expectIPv6Enabled/enabled false to silence). Validated via --validate-config.
…7229) Add ipv6Sysctl/ipv6Route/ipv6Neighbor/ipv6Firewall monitor blocks to values.yaml(.template) and render them in configmap.yaml (each gated by its .enabled), using procPath /host/proc (host fs bind-mount). Switch exporters.http.bindAddress default 0.0.0.0 -> :: (dual-stack; server falls back to 0.0.0.0 when IPv6 is disabled). helm template renders; rendered config contains the four network-ipv6-* types and bindAddress "::".
Document the four detection-only IPv6 monitors (ipv6Sysctl/Route/Neighbor/ Firewall), the dual-stack :: bindAddress default with IPv4 fallback, how they degrade gracefully on IPv4-only nodes (and how to disable them), and the address_family metric label + IPv6 PrometheusRule alerts.
… (Task #17231) configuration.md: IPv6/dual-stack subsection (4 detection monitors with config keys, gateway addressFamily, DNS recordType: AAAA, :: bind + fallback, IPv4-only degradation/disable). monitors.md: section + TOC for the four network-ipv6-* monitors (checks, keys, conditions) + address_ family metric label. remediation.md: flush-ipv6-route operation (ip -6 route flush cache) and family-agnostic flush-dns note. Verified all type strings/conditions/keys against source; did not claim a registered ipv6-route-flush remediator TYPE (that registration is blocked, #19263).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f0362a84f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if cond.Status == types.ConditionFalse { | ||
| return true |
There was a problem hiding this comment.
Treat active problem conditions as failed cycles
Several monitors use problem-style conditions where True means the problem is active and False means it is cleared (for example the new IPv6 monitors emit IPv6DefaultRouteMissing=True/IPv6FirewallBlackhole=True, and existing NetworkPartitioned=True follows the same pattern). With this check, those active failures are recorded as monitor_cycles_total{result="success"}, while healthy/cleared False states are recorded as errors, so the new monitor-cycle self-metrics are inverted for these monitors.
Useful? React with 👍 / 👎.
| // One ExportStatus call corresponds to one completed monitor check cycle: | ||
| // the monitor ran its check and emitted a status, which the detector forwards | ||
| // here. Time the cycle and record self-metrics at the end via RecordMonitorCycle. | ||
| cycleStart := time.Now() |
There was a problem hiding this comment.
Measure check duration before exporting status
ExportStatus is called from processStatus only after the monitor check has already completed and its Status has been received, so starting cycleStart here measures Prometheus export processing time rather than the monitor check duration. Slow monitor executions will therefore look fast in monitor_check_duration_seconds, making the new self-metric misleading for latency alerts and dashboards.
Useful? React with 👍 / 👎.
| if err == nil { | ||
| return ln, nil |
There was a problem hiding this comment.
Bind IPv4 when :: succeeds as IPv6-only
On nodes with net.ipv6.bindv6only=1, binding [::]:port succeeds but accepts only IPv6 connections; because this branch returns immediately, the IPv4 fallback never starts. Since this commit changes the health/metrics defaults to ::, IPv4 Kubernetes probes or Prometheus scrapes fail on those hardened nodes even though the code promises a dual-stack bind with IPv4 fallback.
Useful? React with 👍 / 👎.
| ip := net.ParseIP(host) | ||
| return ip != nil && ip.To4() == nil |
There was a problem hiding this comment.
Limit IPv4 fallback to wildcard IPv6 binds
If an operator configures a concrete IPv6 bindAddress such as fd00::10, any bind error (address typo, address not assigned, etc.) is classified as dual-stack and retried on 0.0.0.0, silently exposing the metrics endpoint on every IPv4 interface instead of failing the bad explicit bind. The fallback should be restricted to the empty/unspecified wildcard address, not all IPv6 literals.
Useful? React with 👍 / 👎.
Autonomous sequential implementation of the Node Doctor IPv6 / dual-stack backlog (TaskForge project 41, features 1124–1130). Every commit verified green:
go build ./..., package tests (-short),go vet,gofmt; fullgo test ./... -shortgreen at each feature boundary.Scope completed (29 tasks)
network-ipv6-{sysctl,route,neighbor,firewall}— detection-only, self-registering, 93–95% coverage; registration verified in cmd.addressFamilyselection + metric-based default-route selection; pinger IPv6 link-local zone IDs, ICMPv6 filter + per-instance ID; httpPinger family pre-resolution; kubelet::1loopback fallback; DNS AAAA warnings; CNI family refresh-on-re-IP; dual-stack::bind for exporter/health server; fixed flakyTestCorrelationDetectionFlow.address_familylabel on network metrics; family-aware CNI dashboard + new 29-panel IPv6 dashboard; per-family + IPv6 PrometheusRule alerts.ip -6 route flush cacheoperation; verifiedflush-dnscovers AAAA.ci-ipv6.ymlworkflow (Docker IPv6 + kind).Not in this PR (tracked separately)
ipv6-route-flushremediator) is blocked by a discovered systemic gap: no built-in remediators are registered into the registry at all (registry.Remediate(strategy,…)fails for every type). Filed as node-doctor-236 / #19263 (high). Theflush-ipv6-routeoperation is implemented + tested.freePortTOCTOU in prometheus tests filed as node-doctor-235 / #19260.Per project policy this is not auto-merged.